MICF: An effective sanitization algorithm for hiding sensitive patterns on data mining
Abstract
Data mining mechanisms have been widely applied in businesses and manufacturing companies across many industry sectors. Sharing data, or sharing mined rules, has become a trend among business partners, as it is perceived to be a mutually beneficial way of increasing productivity for all parties involved. However, this practice also increases the risk of unintended information leaks when data are released. To conceal restrictive itemsets (patterns) contained in the source database, a sanitization process transforms the source database into a released database from which the counterpart cannot extract sensitive rules. The transformation may also conceal non-restrictive information, an unwanted side effect called the “misses cost.” The problem of finding an optimal sanitization, one that conceals all restrictive itemsets while minimizing the misses cost, is NP-hard. To address this challenging problem, this study proposes the Maximum Item Conflict First (MICF) algorithm. Experimental results on several real and artificial datasets demonstrate that the proposed method is effective, has a low sanitization rate, and generally achieves a significantly lower misses cost than the MinFIA, MaxFIA, IGA and Algo2b methods.
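The abstract does not describe how MICF chooses which items to remove, so the following is only a minimal Python sketch of the general setting it describes, under stated assumptions. Transactions are sets of items, `min_sup` is an absolute support count, and an itemset is hidden once its support drops below `min_sup`. The "conflict" rule used here (remove the item shared by the most still-exposed restrictive itemsets, taking it from the transaction that supports the most of them and preferring shorter transactions) is a hypothetical reading of the algorithm's name, not the authors' published procedure. The misses cost is computed as the fraction of non-restrictive frequent itemsets (frequent itemsets that contain no restrictive itemset) that are no longer frequent after sanitization. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def support(db: list[set[str]], itemset: frozenset[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db: list[set[str]], min_sup: int) -> set[frozenset[str]]:
    """Brute-force enumeration of all itemsets with support >= min_sup (toy sizes only)."""
    items = sorted(set().union(*db))
    result = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(db, candidate) >= min_sup:
                result.add(candidate)
    return result


def sanitize(db, restrictive, min_sup):
    """Greedily delete items until every restrictive itemset has support < min_sup.

    Hypothetical heuristic: the victim item is the one shared by the most
    still-exposed restrictive itemsets; it is removed from the transaction
    supporting the most of those itemsets, preferring shorter transactions.
    """
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    db = [set(t) for t in db]  # work on a copy; the source database is untouched
    while True:
        exposed = [r for r in restrictive if support(db, r) >= min_sup]
        if not exposed:
            return db
        # "Conflict" count: how many exposed restrictive itemsets each item appears in.
        conflict = Counter(item for r in exposed for item in r)
        victim = max(sorted(conflict), key=conflict.__getitem__)  # ties: smallest item
        targets = [r for r in exposed if victim in r]
        best = max(
            range(len(db)),
            key=lambda k: (sum(r <= db[k] for r in targets), -len(db[k])),
        )
        db[best].discard(victim)  # lowers the support of every target in this transaction


def misses_cost(original, sanitized, restrictive, min_sup):
    """Fraction of non-restrictive frequent itemsets that are no longer frequent."""
    def is_restrictive(x):
        return any(r <= x for r in restrictive)

    before = {x for x in frequent_itemsets(original, min_sup) if not is_restrictive(x)}
    if not before:
        return 0.0
    after = frequent_itemsets(sanitized, min_sup)
    return len(before - after) / len(before)


if __name__ == "__main__":
    source = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
    hidden = [frozenset({"a", "b"})]
    released = sanitize(source, hidden, min_sup=2)
    print(support(released, hidden[0]))              # 1
    print(misses_cost(source, released, hidden, 2))  # 0.0
```

On this toy database, hiding {a, b} at `min_sup=2` removes item `a` from two transactions, and every non-restrictive frequent itemset ({a}, {b}, {c}, {a, c}, {b, c}) stays frequent. The exhaustive `frequent_itemsets` is exponential in the number of items; an evaluation on real datasets would use a proper miner such as Apriori or FP-growth instead.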
Related articles
Data sanitization in association rule mining based on impact factor
Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
Introducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology has become easier. Association rule mining is one of the data mining techniques used to extract useful patterns in the form of association rules. One of the main problems in applying this technique to databases is the disclosure of sensitive data, which endangers security and privacy. Hiding the as...
Privacy Preserving Frequent Itemset Mining by Reducing Sensitive Items Frequency using GA
Frequent Itemset mining extracts novel and useful knowledge from large repositories of data and this knowledge is useful for effective analysis and decision making in telecommunication networks, marketing, medical analysis, website linkages, financial transactions, advertising and other applications. The misuse of these techniques may lead to disclosure of sensitive information. Motivated by th...
Efficient sanitization of informative association rules
Recent developments in Privacy-Preserving Data Mining have produced many efficient and practical techniques for preventing sensitive patterns or information from being discovered by data mining algorithms. In hiding association rules, current approaches require the hidden rules or patterns to be given in advance. In addition, for Apriori algorithm based techniques [26], multiple scanning of the entire dat...
Concealing Sequential and Spatiotemporal Patterns using Polynomial Sanitization
Earlier, the observation of relevant patterns present in a database was seen as a hurdle for database protection. Over time, various approaches for hiding knowledge have emerged, mainly focused on association rule and frequent itemset mining. This paper views the problem from a different angle, i.e., knowledge hiding in the context where the data and extracted knowledge...
Journal:
- Advanced Engineering Informatics
Volume 21
Publication date: 2007